GRACE TR 2015 01 テクニカルレポート | GRACEセンター

(1)

ISSN 1884-0760

GRACE TECHNICAL REPORTS

Graph Transformation as Graph Reduction

FUnCAL: A Functional Reformulation of

Graph-Transformation Language UnCAL

Kazutaka Matsuda Kazuyuki Asada

(Communicated by Zhenjiang Hu)

GRACE-TR 2015-01

January 2015

CENTER FOR GLOBAL RESEARCH IN

(2)

(3)

Graph Transformation as Graph Reduction

FUnCAL: A Functional Reformulation of Graph-Transformation Language UnCAL

Kazutaka Matsuda

The University of Tokyo

[email protected]

Kazuyuki Asada

The University of Tokyo

[email protected]

Abstract

A large amount of graph-structured data are widely used, includ-ing biological database, XML with IDREFs, WWW, and UML di-agrams in software engineering. UnCAL, proposed by Buneman et al. from the database community, is a language designed forgraph transformations, i.e., extracting a subpart of a graph data and con-verting it to a suitable form, which often also has a graph structure. A distinguished feature of UnCAL is its semantics that respects

bisimulation on graphs; this enables us to reason about UnCAL graph transformations as recursive functions, which is useful for verification as well as optimization.

However, despite of this similarity of UnCAL to functional languages, there is still a gap to apply the program-manipulation techniques studied in the programming language literature directly to UnCAL programs, due to some special features in UnCAL, especiallymarkers.

In this paper, first, we give a translation from UnCAL pro-grams to functional ones by emulating markers by tuples andλ -abstractions, so that we can reason about UnCAL programs through functional ones. Thanks to the translation, we can import several verification results designed for functional programs to the UnCAL transformations. Second, to optimize UnCAL graph transforma-tions as functional programs, we give a memoized lazy semantics and a type system so that a well-typed functional program termi-nates and results in a finite graph, under the semantics; that is, well-typed functional programs are graph transformations. Thanks to the semantics and the type system, we can optimize a translated func-tional program freely as long as the optimization keeps typability, and execute it as a graph transformation.

Keywords Graph Transformation, Functional Languages, Lazy

Evaluation, Bisimulation, Regular Trees, Termination, Memoiza-tion

1. Introduction

A large amount of graph-structured data are widely used, includ-ing biological information, XML with IDREFs, WWW, UML di-agrams in software engineering [18], and Object Exchange Model (OEM) for exchanging arbitrary database structures [37]. In such circumstances, several languages, such as UnQL/UnCAL [8], Lorel [1], GraphLog [9], have been proposed mainly from the

database community forgraph transformation or querying over such graph-structured data—extracting a subpart of a graph and converting it to some suitable form—similarly to what XQuery does for XMLs. For example, we want to extract the orders by “Tanaka” from the graph in Figure 1 containing information of customers and their orders to obtain the graph in Figure 2.

UnCAL is a prominent language designed for graph transforma-tions [8]. Among its other nice features such as termination guar-antee and efficient execution byε-edges [8], the most characteristic feature of UnCAL is its semantics that respectsbisimulation, under which a graph and the infinite tree obtained by unfolding sharings and cycles are equivalent. Thanks to the bisimulation-respecting semantics, UnCAL supports functional-programming-style reason-ing: one can reason about UnCAL graph transformations as recur-sive functions that generate infinite trees, which is useful for verifi-cation [24] as well as optimization [8, 21].

However, despite of this similarity of UnCAL to functional languages, there is still a gap between UnCAL programs and func-tional ones, which will be explained in more detail in Section 1.1. Due to the gap, it is hard to apply program-manipulation tech-niques studied in the programming language literature directly to UnCAL programs. This is unfortunate to both communities; the database community cannot enjoy well-studied programming-language techniques, and the programming-programming-language community loses chances to contribute to the other community. Actually, sev-eral methods have been proposed for UnCAL while there already have been similar methods in the programming language literature. For example, the key technique in the optimization in [8, 21] is quite similar to the classic fold-fusion [32].

The purpose of this paper is to fill the gap between UnCAL and usual functional languages so that we can directly apply program-manipulation techniques studied in the programming-language community to the graph transformation problem. Specifically, in this paper, we give a translation from UnCAL programs to func-tional ones so that we can reason about, manipulate and execute UnCAL programs as functional ones.

1.1 Problem and Observation

The gap between UnCAL and usual functional languages, or what prevent us from directly importing techniques studied in the pro-gramming language community, ismarkersused to connect two graphs and to construct cycles. We have to cope with markers to seamlessly import functional results.

There are two usages of markers: inputand output. Roughly speaking, input markers are names for multiple-roots and output markers are names for holes. UnCAL also has expressions that connect nodes indicated by input markers (input nodes) and those indicated by output markers (output nodes) of the same names.

(4)

1003 "16/12"

1002 "16/10" 1001 "16/07" "IPL of Tokyo" "100-888" contractual "BiG office" "200-777" shipping

no date order_of order

add email name add

info code type "kato@biglab" "Kato" no date order_of order order

add email email name

no date order_of

info code type "tanaka@gmail" "tanaka@biglab" "Tanaka" customer customer

Figure 1. A graph data containing order information in customer-oriented representation.

order order

customer_name addr date no

"Tanaka" type code info

shipping "200-777" "BiG office" "16/07" 1001 "16/10" 1002

Figure 2. A graph data obtained from Figure 1 by extracting orders by “Tanaka” in order-oriented representation.

markers, graphs in UnCAL are similar to records, as below.

{name:Alice,email:alice}

Markers are used as an interface for connecting other graphs. In the following graph, we can connect something to output node&x.

{name:Alice,friend:&x}

A graph to be substituted to the output node must have the corre-sponding input maker, which can be assigned by⊲as below.

&x ⊲{name:Bob,friend:&y}

Then, we can connect the two graphs by@; for example, by writing

@(&x ⊲{name:Bob,friend:&y})

we get the following graph.

{name:Alice,friend:{name:Bob,friend:&y}}

Cyclic graphs can be constructed bycycle(g)that connects input nodes and output nodes ing, as follows.

cycle

&y ⊲

@(&x ⊲{name:Bob,friend:&y})

Then, we obtain a cyclic graph that representsAliceandBobare friends of each other.

Graph transformations are written by usingsrec, a structural recursion on graphs. Thanks to the bisimulation-respecting seman-tics,sreccan be understood as if it were defined recursively as:

srec(e)({a1 :G1, . . . ,an:Gn}) =

(e(a1, G1)@srec(e)(G1))∪. . .∪(e(an, Gn)@srec(e)(Gn)) Here, ∪ is the record concatenation; actually {x:s,y:t} in UnCAL is a shorthand notation for{x:s} ∪ {y:t}. For exam-ple, the following UnCAL expression returns people namedBob in dbwheredb is a variable that stores a record of the form of

{person:p1, . . . ,person:pn}.

srec(λ( , p).

srec(λ(l, n).&r ⊲

ifl=namethen

srec(λ(l′_, ₎_._if_l′₌_Bob_then_{person_:_p_}_else_{}₎₍_n₎_∪_&_r

else

&r)(p))(db)

The output node&rrepresents the result of the recursive call of the second-outermostsrec.

One might notice that these behaviors of input/output markers,

@, andcyclecan be emulated byλ-abstractions andletrec. For example, the UnCAL expression above that constructs the cyclic graph can be written as below.

letrecy= (λx.{name:Alice,friend:x}) {name:Both,friend:y} iny

Also, one might notice that the behavior ofsreccan be expressed by a paramorphism [31]parathat behaves like:

paraf {a1:G1, . . . ,an:Gn}

= (fa1G1(paraf G1))∪. . .∪(fanGn(paraf Gn))

It would seem that reasoning and execution of UnCAL programs as functional ones would look straightforward.

However, the straightforward translation is not enough because the translation can map terminating UnCAL expressions to nonter-minating ones. This is problematic if we apply optimization tech-niques such as fusion [32] to UnCAL programs because we may not execute optimized translated programs, although the translation still is useful in reasoning of UnCAL programs. For example, the translation convertscycle(&x ⊲&x), which results in a singleton graph in UnCAL, toletrecx=xinx, which leads to an infinite loop in usual languages. Although the expressioncycle(&x ⊲&x)

itself is rarely seen in practice, a similar problem arises when we write graph transformations bysrec. For example, let us consider the following UnCAL expression that eliminates all the edges from

dband thus returns a singleton graph for anydb.

srec(λ( , ).&r ⊲&r)(db)

The transformation can be seen as a simplified version of the above transformation that searchesBob, in the sense that it models the behavior of the second-outermostsrecof the transformation when it is applied to a graph with nonames. Here comes a problem. The behavior of the transformation differs after the translation if we apply it to a cyclic graph like that obtained bycycle(&x ⊲

{a:&x}). The UnCAL expression

srec(λ( , ).&r ⊲&r)(cycle(&x ⊲{a:&x}))

terminates and returns a singleton graph while the corresponding functional program

(5)

goes into an infinite loop.

Another but related issue is that we want to obtainfinite graphs

as evaluation results, instead ofinfinite trees, because our goal is “graph” transformation.

In summary, we have to deal with these problems in order to apply the manipulation techniques studied in the program-ming language community to the graph transformation problem.

1.2 Contributions

In this paper, first, after a brief review of UnCAL (Section 3), we formalize a translation from UnCAL programs—which manipu-late finite graphs—to functional programs that manipumanipu-late infinite trees (Section 4). The translation just follows the idea shown in Section 1.1. The purpose of Section 4 is to clarify the relation-ship between UnCAL programs andusual functional programs. This translation is useful also to reason about UnCAL programs as functional ones, and then is useful to import verification tech-niques (Section 2.1).

Second, to optimize UnCAL programs as functional ones, we give a semantics (Section 5) and a type system (Section 6) for the target language of the translation; a well-typed functional pro-gram under the type system can be executed as afinite-graph trans-formation under the semantics withtermination guarantee (Sec-tion 7). We also show that the translated func(Sec-tional programs are well-typed, and that semantics in Section 4 and that in Section 5 “coincide”. Thanks to the type system, users can freely optimize translated programs and finally run them as graph transformations, as long as the optimization keeps typability (Section 2.2). Note that our semantics itself is not new and nothing special; it is just the lazy semantics [34] with the black hole [2, 3, 34] and memoization. This helps us to implement the semantics easily, which runs faster than the existing implementation of UnCAL [23] (Section 2.3). Also, this enables us to bidirectionalize [29, 30] of UnCAL transforma-tions (Section 2.4).

The main contributions of this paper are summarized as below:

•We formalize the transformation from UnCAL to functional ones to support reasoning of UnCAL programs as functional ones (Section 4).

•We give the semantics and the type system so that we can optimize the translated functional programs and execute them as graph transformations (Sections 5, 6 and 7).

•We show some applications of the proposed translation, seman-tics, and type system (Section 2).

2. Benefits

We start the paper with showing the benefits of our results, i.e., the translation from UnCAL to functional programs so that we can reason about UnCAL programs as functional ones, and the semantics and type system to support optimization and execution of UnCAL programs via functional ones. Also, our result enables us to import bidirectionalization techniques [29, 30, 41] studied for functional programs.

2.1 Verification

A verification problem of graph transformation is, given setsA

and B and a transformationf, to check if∀a ∈ A.f(a) ∈ B

holds or not. For XML transformations, these setsAand B are usually described in DTD, XML Schema, or RELAX NG. For model transformations seen in software engineering, they are often described in KM3 [26].

A few but interesting results are known for the verification prob-lem on UnCAL. Buneman et al. [7] represent graph schemata (A

andB above) again in graphs so that they can directly compute

the imagef(A) by simply applyingf to (a graph of)A. Inaba et al. [24] reduce the problem to the validity checking of monadic second-order logic (MSO) formulae whenAandBare also given in MSO (fragments that respect bisimilarity), with some type anno-tations to a programfby users.

Our translation from UnCAL programs to functional ones en-ables us to access alternative methods, because the translation also reduces the verification problem for UnCAL programs to that for functional ones, which manipulates infinite trees instead of graphs. For example, thanks to our translation, we can use a verification method by Unno et al. [40] for graph transformations; it is orig-inally designed for tree transformations written in (higher-order) functional programs where the trees can be infinite.

Although Inaba et al. [24]’s method is well tailored to UnCAL/UnQL and thus the benefits are rather small for the “current” UnCAL/UnQL, the advantage of our translation becomes clearer when we extend UnCAL/UnQL. For example, if we extend UnCAL/UnCAL to in-clude higher-order functions to improve the programmability as in [22], then Inaba et al. [24]’s method becomes no longer appli-cable due to the higher-order constructs. In contract, the method by Unno et al. [40] is applicable for such extensions because it originally targets higher-order functional programs.

2.2 Optimization

Optimization is an important issue also in graph transformation. There have been a few approaches for optimization of UnCAL pro-grams [8, 21]. The basic idea of these approaches is to elaborate the fact that UnCAL transformations respect bisimilarity and to rewrite

srecas if it were defined as a recursive function on infinite trees as mentioned in Section 1.1. In addition to this basic idea, Hidaka et al. [21] also focus on manipulations of markers; for example, for

e@(&x⊲e′), their transformation statically computes the

plugging-in operation by substitutplugging-ing&xinebye′, and sometimes replaces the expression toestatically or dynamically ifedoes not contain the output marker&x.

The relationship between these UnCAL-specific techniques and usual optimization techniques for functional programs becomes clearer by our translation. Since our translation mapssrectofold

on graphs as will be shown in Section 4, we can reinterpret the ba-sic idea of their optimization as a special case of the clasba-sical fold-fusion [32]. Since the expressione@(&x⊲e′)is converted to expres-sion(λx.e)e′_{by our translation, we can regard Hidaka et al. [21]’s}

optimization as simplification byβ-reduction. Both techniques are well understood in the programming language community.

In addition, our translation enables us to access heavier or lighter alternatives. To the translated programs from UnCAL, we can apply optimization methods safely, as long as the optimization preserves typability with respect to the type system in Section 6. For example, on the one hand, when the execution time will be more significant the compilation/optimization, we can use heavy but effective optimization methods such as supercompilation [38]. On the other hand, when the compilation time is as important as execution time, which is also typical in DB-querying, lighter-weight approaches such as short-cut fusion [17] and lightlighter-weight fusion [36] are preferable.

2.3 Implementation

(6)

|V| |E| Ours GRoundTram Speed Up

Class2RDB 55 73 0.041 0.16 3.9

PIM2PSM 46 58 0.005 0.032 6.4

C2O_Sel 25 45 0.003 0.014 4.7

a2d_xc S30k 30000 29999 0.86 3.7 4.3

a2d_xc M200 40000 80000 3.7 5.5 1.5

a2d_xc C200 201 40200 1.5 2.0 1.3

Table 1. Experimental results (running time is in CPU seconds).

We implemented our semantics in Section 5 as an embedded DSL on Haskell.1Although we cannot use GHC’s lazy evaluation mechanism directly mainly due to the memoization we use (Sec-tion 5), we can explicitly represent memoized computa(Sec-tion by us-ing a monad. It is expected that the overhead of graph operations in a naive implementation, i.e., the overhead of directly handling sets of nodes and edges, or more precisely(V, E, I, O)quadruples as will be shown in Section 3, disappears in the implementation.

We measured execution time of a few transformations and compared the execution time with GRoundTram [20, 21, 23]2, an UnCAL implementation using OCaml. GRoundTram is im-plemented basically according to the semantics that will be shown in Section 3, with some optimizations [8, 21]. The experiments were held on MacOSX 10.8.3 over MacBook Air 11-inch with 1.4 GHz Intel Core 2 Duo CPU and 4 GB memory. We used GHC 7.6.3 (with LLVM 3.3) for Haskell and ocaml 4.00.1 for OCaml. The purpose of the experiments is to measure how the seman-tics in Section 5 is useful to implement UnCAL graph opera-tions. Thus, we did not compile the tested programs and graphs; we used runhaskellinstead while we compiled our embedded library providing the primitive graph operations. The examined programs were: Class2RDB is a benchmarking model transfor-mation [5],PIM2PSM(from [21]) converts a platform independent model to a platform specific model, andC2O_Sel(from [21]) con-verts a customer-order database from a customer-oriented repre-sentation to a order-oriented reprerepre-sentation with some extraction,

anda2d_xc(from [20]) renamesatodand contractsc. The

pro-gram codes ofClass2RDB, PIM2PSM andC2O_Sel are mechan-ically generated; they are originally written in UnQL+ [23] and converted to UnCAL. For a2d_xc, we used the three graphs as input:S30kis a 30000-long sequence of _{◦ ◦} A . . . _◦

/

/ A

/

/ A

/

/ ,M200is

a lattice-like graph of40000nodes in which thei-th node connects to the(i+ 1)-th and(i+ 200)-th nodes modulo40000byA, and C200is a graph of201nodes in which every node connects to the other nodes byA.

Table 1 shows the experimental results. In all the examined cases, our implementation ran significantly faster than GRound-Tram. The main source of the speed-up would be that there is no overhead of manipulating nodes and edges, which are stored in Sets in ocaml, in our implementation. Since GRoundTram already equips lazy-like evaluation for some special cases [21], the laziness of the translated program would not contribute to the speed-up so much. However, the speed-up is more significant inClass2RDB,

PIM2PSMandC2O_Sel, which contain 114, 74 and 28 applications

ofsrecs, respectively.

2.4 Bidirectionalization

A bidirectional transformation is a pair of a usual transforma-tion of type S → V and a “backward” transformation of type

S → V → S that maps changes on the transformed data to the original data [15, 16, 20, 29, 30, 41]. A classic instance of

bidirec-1_{The implementation is available from}_{https://bitbucket.org/kztk/}

funcal

2_{http://www.biglab.org/download.html}

tional transformation is the “view updating problem” studied in the database community [4, 11], where a view is a result of a query (i.e., a transformation). Recently, bidirectional model transformation has been studied in the software engineering community to synchronize high-level model and low-level implementation [42, 43]. Bidirec-tional semantics of UnCAL [20] has been studied for this applica-tion because usually those models are represented by graphs.

Our translation enables us to access bidirectionalization meth-ods [29, 30, 41] studied for functional programs. In [29, 30], if a functionfhas polymorphic type

∀α.∀µ.PackM c α µ⇒T α→µ(T′α)

whereT andT′are type constructors for container-like datatypes, namely instances ofTraversable, andPackMc α µis a type class with the methods below3

new::c→α

liftO::Eqr⇒(c→r)→α→µ r

liftO2 ::Eqr⇒(c→c→r)→α→α→µ r

then we can derive a backward transformation of type T c →

T′c → T ccorresponding tofc,I :: T c → I(T′ c)whereα andµare instantiated bycand the identity monadI respectively.

It is rather straightforward to write an interpreter according to the semantics discussed in Section 5 that has the following type.

eval::PackM Lα µ⇒Exp →Envα→µ(Graphα)

Here,Expis a type for translated expressions, andEnvandGraph

respectively are types for environments and graphs polymorphic in graph labels. The idea to write such an interpreter is that (1) instead of using label constantadirectly we usenewa, and (2) instead of direct label comparisonl=aorl=l′, we use lifted comparisons

liftO(=a)lorliftO2(=)l l′.

There already exists a bidirectional semantics [20] of UnCAL. Although their method can handle insertions and deletions of edges in addition to label updates, it is very complex even for label updates while our approach is much simpler. Also, their method is not robust for language extension. For example, if we extend UnCAL/UnCAL to include higher-order functions, then Hidaka et al. [20]’s method becomes no longer applicable due to the higher-order constructs. In contrast, Matsuda and Wang [29, 30]’s method is still applicable because its only requirement is the polymorphic type above and thus it allows us to use higher-order functions in a forward transformation.

3. Brief Overview of UnCAL

In this section, we briefly overview UnCAL [8], a first-order func-tional programming language that manipulates graphs. UnCAL is an internal language of UnQL [8] and its extension UnQL+[23], in which users can write query like SQL; for example, the query that extracts a person namedBobshown in Section 1.1 can be written as follows.

select $p where

$p in $db, {name: $n, friend: $f} in $p, {$nameL: {}} in $n, $nameL = Bob

Once a query is written in UnQL/UnQL+, it is converted to an UnCAL program and then is executed. This means, our translation is also beneficial to UnQL/UnQL+. It is remarkable that, recently, UnCAL is applied to bidirectional model-driven software develop-ment [20, 23, 43], where a structure of a software is modeled as a graph in different levels of abstractions and their relationships are described by graph transformations.

3_{In the original paper,}_PackM_{has a method to lift}_{n-ary functions instead}

(7)

3.1 Graphs in UnCAL

UnCAL deals withmulti-rooted, directed, and edge-labeled graphs with no order on outgoing edges. The characteristic points of the UnCAL graphs are: (1) the UnCAL graphs can havemarkersthat indicate roots and holes, (2) the UnCAL graphs can haveε-edges that have similar behaviors toε-transitions in automata, and (3) the equivalence of the UnCAL graphs are defined by bisimulation.

As mentioned in Section 1.1, markers are used in the two ways:

inputandoutput. Input markers are names for multiple roots, and output markers are names for holes. Nodes may be marked with input and output markers (input nodesandoutput nodes), and they can be connected to produce other graphs; e.g., one can construct cycles by connecting nodes with input markers to that of output markers of the same names in the same graph.

UnCAL graphs can containε-edges representing “short-cuts”, similarly to theε-transitions in automata. For example, if a nodev

is connected to a nodeuby anε-edge, it means that the edges of

uare also edges ofv(the converse is not necessarily true because

vcan have other edges than thisε-edge). UnCAL usesε-edge to delay some graph operations for efficiency [8].

We define the UnCAL graphs formally. Let M be a set of markers and Lbe a certain set of labels. An UnCAL graph G

is a quadruple (V, E, I, O), where V is a set of nodes, E ⊆

V×(L∪ {ε})×V is a set of edges,I⊆ M ×V is a set of pairs of input markers and the corresponding input nodes, andO⊆V×M is a set of pairs of output nodes and associated output markers. In addition, we require that, for each marker&x∈ M, there is at most one nodevsuch that(&x, v) ∈ I. In other words,I is a partial function from markers to nodes and is sometimes denoted as such. For a singly-rooted graph, the default marker &is often used to indicate the root. We call the markers in the sets{&x|(&x, )∈I} and {&x|( ,&x)∈O} input and output markers, respectively. Throughout this paper, we fix the (denumerable) set of labelsL.

The equivalence between UnCAL graphs is defined by bisim-ulation extended withε-edges. Intuitively, two UnCAL graphs are equivalent if the infinite trees obtained by unfolding sharings and cycles are identical, after short-cutting all theε-edges. Let us de-fine the bisimilarity between graphs formally. We writev→l_u

if there is an edge(v, l, u) ∈Ebetween nodesv, u∈V in a graph

G= (V, E, I, O), and write→∗ε

for the reflexive transitive closure of→ε

. AbisimulationXbetween a graphG1 = (V1, E1, I1, O1)

andG2 = (V2, E2, I2, O2)is a relation satisfying the following

conditions: (1) if(v1, v2) ∈ X, for any path satisfyingv1 →∗ ε w1 →a u1 there is a path satisfyingv2

∗

→ε _w

2 →a u2 and

(u1, u2) ∈ X, and for any pathu2satisfyingv2

∗

→ε _w

2 →au2

there is a path satisfyingv1 →∗εw1→au1and(u1, u2)∈ X; (2)

if(v1, v2)∈ X, for any pathv1

∗

→ε _u

1such that(u1,&x)∈O,

there is a pathv2→∗ εu2such that(u2,&x)∈O, and conversely,

for any pathv2 →∗ε u2 such that(u2,&x) ∈ O, there is a path v1 →∗ε u1such that(u1,&x)∈O; and (3)dom(I1) = dom(I2)

and(I1(&x), I2(&x)) ∈ X for any&x ∈ dom(I1) = dom(I2).

Two graphsG1andG2are calledbisimilar, denoted byG1∼G2,

if there is a bisimulation betweenG1andG2.

Note that the graph bisimulation is different from weak bisim-ulation [33] or the equivalence of the languages of automata. The following examples illustrate the difference.

◦ ◦ ◦ ◦ ◦ ε ε

a b ∼ ◦ ◦ ◦ a b ◦ ◦ ◦ ◦ ◦ c c

a b 6∼ ◦ ◦ ◦ ◦ c a b

The bisimilarity of the first two examples shows the difference from weak bisimulation; recall thatε-edges represent shortcuts. The non-bisimilarity of the last two examples shows the difference from the equivalence of the trace sets, or the equivalence of automata.

g::={} | {l:g} |g1∪g2 (graph constructors)

|&x ⊲ g|&y|()|g1⊕g2

|g1@g2|cycle(g)

| srec(λ(l, t).g1)(g2) (structural recursion)

| t (graph variable)

l ::=l|a

Figure 3. The syntax of the positive subset of UnCAL

{} {a: G}

G

a

G

&x G

&x

()

&x1 ... &xk

&y1 ... &yn

&x’1 ... &x’m

&y1 ... &yn

G1 G2

G1♁G2 G1@G2

&x1 ... &xk

&y1 ... &ym

G1

&z1 ... &zn

&y1 ... &ym

G2

ε ε

cycle (G)

&x1 ... &xm

ε ε

G

&y

G1 ∪G2 G1 G2 &x1 ... &xm

&y1 ... &yn

Figure 4. Graph Constructors

The bisimilarity-based semantics of the UnCAL graphs sup-ports usual equational reasoning of recursive functions on graphs [8, 21]. The purpose of this paper is to further investigate this direc-tion and show a closer reladirec-tionship between UnCAL programs and functional ones.

3.2 Syntax and Semantics

Figure 3 shows the positive [8] subset of UnCAL that we mainly target in this paper. The subset of UnCAL consists of the nine graph constructors andsrecfor astructural recursion. Compared with full UnCAL [8], the subset does not containif-expressions and the

isEmptyoperator that checks if a graph has at least one non-ε -edge accessible from the roots or not. The former restriction is just for simplicity; we can extend our discussions straightforwardly to

if-expressions.

In contrast, we have to be more careful with isEmpty. As we will discuss in the end of Section 4, there is no computable counterpart ofisEmpty in generalfunctional programs, and it must be converted to an oracle. However, unlike the discussion in Section 4, the discussions in Sections 5, 6 and 7 can be easily

extended toisEmpty becauseisEmpty becomes computable

for the class of functional programs restricted by the type system in Section 6.

3.2.1 Graph Constructors

UnCAL has the nine graph constructors, {}, { : }, ∪, &x ⊲

, &y, (), ⊕, @, and cycle. Some of them are already

men-tioned in Section 1.1. A record notation shown in Section 1.1 such as {name:Alice,email:alice} is a syntax sugar for {name:{Alice:{}}} ∪ {email:{alice:{}}}. Figure 4 illus-trates their intuitive behaviors. In what follows, we introduce the formal definitions of these nine constructors each by each.

Singleton GraphThe expression{}constructs a (single) root-only graph, of which semanticsJ{}Kis defined by:

J{}K= ({v},∅,{&7→v},∅)

Herevis a fresh node.

(8)

JgK; formally, its semantics is defined by:

J{a:g}K= (V ∪ {u}, E∪ {(u,a, v)},{&7→u}, O)

where(V, E,{&7→v}, O) =JgK

Here,uis a fresh node.

Edge-set UnionThe expression g1 ∪g2 adds twoε-edges from

the new root to the roots of G1 and G2, whereG1 and G2

are evaluation results ofg1andg2, respectively. Formally, its

semantics is defined by:4

Jg1∪g2K= (V, E,{&7→v}, O1∪O2)

where (V1, E1, I1, O1) =Jg1K

(V2, E2, I2, O2) =Jg2K V =V1∪V2∪ {v}

E =E1∪E2∪ {(v, ε, I1(&)),(v, ε, I2(&))}

Here, v is a fresh node, and V1 and V2 are assumed to be

disjoint.

Named Hole The expression&yconstructs a graph with a single node marked with an output marker&y, of which semantics is defined by:

J&yK= ({v},∅,{&7→v},{(v,&y)})

Herevis a fresh node.

Naming RootThe expression&x ⊲ gnames the root ofJgKby&x, of which semantics is defined by:

J&x ⊲ gK= (V, E,{&x7→v}, O)

where(V, E,{&7→v}, O) =JgK

Unlike the original definition [8], we restrict that the input markers ofgmust be the singleton{&}for simplicity.

Root-set Union The expressiong1⊕g2combines two graphsJg1K

andJg2Kwith different sets of input markers, of which

seman-tics is defined by:

Jg1⊕g2K= (V1∪V2, E1∪E2, I1∪I2, O1∪O2)

(V2, E2, I2, O2) =Jg2K

Here, we assume thatV1 andV2are disjoint, and require that dom(I1)anddom(I2)are disjoint.

Empty Graph The expression()represents a graph with no nodes or edges, i.e.,

J()K= (∅,∅,∅,∅)

Plugging InThe expressiong1 @g2 replaces holes inJg1Kwith

roots ofJg2Kthat share the same names, of which semantics is

defined by:

Jg1@g2K= (V, E, I1, O2)

(V2, E2, I2, O2) =Jg2K V =V1∪V2

E=E1∪E2∪ {(v, ε, I2(&x))|(v,&x)∈O1}

Here, we assume that V1 and V2 are disjoint, and require ran(O1)⊆dom(I1).

Cycle The expression cycle(g) constructs cycles by replacing holes with roots in JgKthat share the same names, of which

4_{In the definition, we restrict that}_g

i(i= 1,2)has only one root while the

original definition allowsg1andg2to have multiple roots. This is handy

when we convert UnCAL to functional programs. This restriction does not

lose the expressive power; forg1andg2with input markers&x1, . . . ,&xn,

the originalg1∪g2can be rewritten as(&x1⊲((&x1@g1)∪(&x1@g2)))⊕

. . .⊕(&xn⊲((&xn@g1)∪(&xn@g2))), in which∪satisfies this restriction.

semantics is defined by:

JcyclegK= (V, E′, I,{(v,&x)∈O|&x6∈dom(I)})

where(V, E, I, O) =JgK

E′=E∪ {(v, ε, I(&x))|(v,&x)∈O}

Some examples have been shown already in Section 1. The fol-lowing is an alternative way to define the cyclic graph in Section 1.

&a@cycle((&a ⊲({name:{Alice:{}}} ∪ {friend:&b}))

⊕(&b ⊲({name:{Bob:{}}} ∪ {friend:&a})))

Structural Recursion The expressionsrec(λ(l, t).g)( )

repre-sents astructural recursionin the sense that a function f(x) =

srec(λ(l, t).g)(x)satisfies the following laws [8].

f({}) ={} (SR1)

f({a:G}) =g[a/l, G/t]@f(G) (SR2)

f(G1∪G2) =f(G1)∪f(G2) (SR3)

Thanks tosrec, UnCAL can express many graph transformations in an efficient way, with guarantee of termination [8, 23].

Formally, its semantics is defined by:

Jsrec(λ(l, t).g)(g′)K= (V′∪S

ζ∈EVζ, E′∪Sζ∈EEζ, {&x7→uv0,&x|&x∈Z},∅)

where

(V, E,{&7→v0},∅) =Jg′K

(Vζ, Eζ, Iζ, Oζ) =Jg[a/l,(V, E,{&7→v},∅)/t]K

(ζ= ( ,a, v))

V′={uv,&x|v∈V,&x∈Z}

E′=

(v′, ε, uv,&x)| ∃a, u.(v′,&x)∈O(u,a,v)

∪

(uv,&x, ε, v′)| ∃a, u. I(v,a,u)(&x) =v′

Here,V′are fresh nodes, andZ = dom(Iζ)andran(Oζ) ⊆ Z for eachζ ∈ E. We assume thatdom(Iζ) = dom(Iζ′) for all

ζ, ζ′ ∈ E, andVζ and Vζ′ are disjoint for different ζ, ζ′ ∈ E. Intuitively, the semantics computes g[a/l, G/t] for each edge in Jg′K, and connects them byε-edges which corresponds to (SR2) and (SR3). Unlike the original definition [8], we require the graph Jg′_K _{to have only one root named}_& _{and no holes. The former}

restriction is just for simplicity. For typed UnCAL [8], we can convert UnCAL programs to ones that satisfy the condition. In contrast, the latter restriction reduces the expressive power to some extent. However, UnCAL programs that violate the latter restriction are rare in practice. For example, UnCAL programs obtained from the surface languages UnQL and UnQL+satisfy the restriction.

3.3 Types

One would notice that there are some conditions on markers to perform some graph constructions such as∪. To guarantee these conditions, UnCAL has a type system concerning markers [6, 21], in which a graph type is of the formDBX

Y. A typeDBXY represents a set of the graphs whose input markers areexactlyX, and whose output markers arecontained inY. For example, expression&y

can have typeDBXY whereX ={&}andY ⊇ {&y}. Recall that

&is the marker to refer roots obtained from{},{ : }and&y. We

shall omit the typing rules, because it is straightforward and one can extract the typing rules from the conversion rules from UnCAL to functional programs, which will be shown in the next section.

(9)

e::=x|λx.e|e1e2 (λ-terms)

| πn

i e|(e1, . . . , en) (projections and tuples)

| a (labels)

| e1:e2|e1∪e2| • (graph constructors)

| fixGe (first-order fixed-point operator)

| foldne (structural recursion for graphs)

Figure 5. The syntax of FUnCAL, the target language of the trans-lation.

As mentioned in Section 1.1, we convert an UnCAL program that manipulates graphs to a functional one that manipulates in-finite trees, by emulating the UnCAL specific features, markers

and their manipulation, by standard notions in functional program-ming languages. Specifically, we emulate input markers—names for roots—by tuples, and output markers—names for holes—byλ -abstractions.

It is true that a translated functional program can be nontermi-nating; for example, an UnCAL expressioncycle(&)satisfying

Jcycle(&)K= ◦ _a_a ε ∼J{}K

is converted tofixG(λx.x) that diverges. However, this is rather natural and not problematic in the bisimulation-based reasoning, which will be shown in Section 4.2. Recall that, in process calculi, (strong or weak) bisimilarity cannot distinguish a terminating pro-cess from a nonterminating propro-cess if each of them does not interact other processes. How to execute the translated programs as graph transformations will be discussed in Sections 5, 6 and 7.

Recall thatgofsrec(. . .)(g)is restricted not to contain out-put markers. This is a key to regarding outout-put markers as holes. For example, in the original UnCAL without the restriction,

srec(λ(l, t).g′₎₍_&_y_{) =} _&_y _{holds for} _g′ _{with type}_DB{&}

{&}. This

behavior is different from that of holes; if the output markers are holes, a graph substituted to a hole must be traversed by thesrec.

4.1 Translation

The syntax of the target language of our translation is given in Fig-ure 5. We call the language FUnCAL. The target language contains

λ-expressions, tuples (whereπn

i is the projection of theith element from anntuple), (infinite) tree constructors, and the (first-order) fixed-point operatorfixG, and structural recursionsfoldnf. Tree constructors consist of a leaf•, edge extension(:), and branch con-struction∪. For simplicity, we shall writea:bfor{a:b} hence-forth. We usefixGinstead ofletrecthat appeared in Section 1.1, since it is handy to discuss reductions. The structural recursion

foldnfis “fold” for the tree constructors defined recursively as:

foldnf• = (•, . . . ,•) (Fold1)

foldnf(x:y) =f x(foldnf y) (Fold2)

foldnf(x∪y) = (foldnf x)∪(foldnf y) (Fold3)

Note that foldn returns an n-tuple of trees rather than a tree; in the right-hand side of (Fold3), we overload ∪ to tuples as

(x1, . . . , xn)∪(y1, . . . , yn) = (x1∪y1, . . . , xn∪yn). Unlike general “fold”, the operations for•and∪are fixed in the definition. As we have mentioned earlier, in our translation, we emulate input markers by tuples and output markers by λ-abstractions. Thus, an UnCAL expressiong::DBXY is translated to

e::G|Y|→G|X|

whereGis the (coinductive) datatype defined by:

dataG=• |L:G|G∪G

In this section, we assume that FUnCAL has the standard simple type system with the datatypeGand the label typeL; we later refine it to guarantee termination (Section 6). For example, an expression

λf.foldnfhas type(L→Gn→Gn)→G→Gn.

We introduce several notations. We sometimes shall write πi instead of πn

i if n is clear from the context. We assume that markers are totally ordered, and writeXfor a tuple(x1, . . . , xn) where{&x1, . . . ,&xn} = X and &xi < &xj (i < j). Some-times, we use a syntax sugarλX.eforλt.e[(πit)/xi]1≤i≤nwhere

(x1, . . . , xn) = X. For example, assumingx1 < x2, we write λ(x1, x2).x1forλt.π1t.

Our translation is defined according to the typing derivation of UnCAL. A translation judgmentΓ⊢g::DBX

Y e reads that

an expressiong of typeDBXY under a typing environment Γin UnCAL is converted to an expressione, where the type ofgand types inΓare converted fromDBXY′′ toG|

Y′_|

→G|X′|. Figure 6 shows the translation rules for the UnCAL graph constructors. If we ignore the❀ epart ofΓ⊢g::DBX

Y e, the judgment and

rules coincide to the typing judgment and rules of UnCAL [6, 21]. The conversion rule forsrecis a bit involved and thus is written separately as follows.

Γ, t::DB{_∅&}⊢g1::DBZZ e1 Γ⊢g2::DB{∅&} e2

Γ⊢srec(λ(l, t).g1)(g2) ::DBZ∅

λy.para_|Z|(λl.λt′.(λt.e1) (λ().t′)) (e2())

C-REC

Here,parane, representing a “paramorphism”5[31], where an ex-pressionλf.paranfhas type(L→G→G

n

→Gn)→G→Gn, is a syntax sugar defined by:

parane y≡p

foldn+1

λz.λx.q e z(p′_x_{) (}_{p x}₎

z:p′_x y

wherep::Gn+1→Gn,p′_::_Gn

→Gandq::Gn→G→Gn+1

are functions to rearrange tuples defined asp x= (π1x, . . . , πnx),

p′ _x ₌ _π

n+1x, andq x y = (π1x, . . . , πnx, y). This definition ofparan is similar to how a paramorphism is represented by a catamorphism (fold) via tupling [31]. The rule C-REC becomes a bit complicated due to explicit conversion betweenGand()→G. Since the argument ofpara_nmust be of typeGinstead of()→G, we apply()toe2. In addition, since the conversion assumes thatt

ine1has type()→G, we construct such a function byλ().t′.

For example, cycle({a:&}) of type DB{_∅&} is converted to

λy.fixG(λx.a : x) of type () → G after some simplification based on the standardβandηconversions. An UnCAL expression

srec(λ(l, g).&)(cycle({b:&}))is converted toλy.fold1(λl.λr.r)(fixG(λx.b:x))

after some simplification.

It is not difficult to show the translated programs are well-typed.

4.2 Correctness

Although the translation is rather simple, some extra effort is re-quired to state its correctness; we have to be careful with the follow-ing difference between UnCAL and FUnCAL: An UnCAL graph of typeDBXY, which can contain output markers inY, is translated to a tree-to-treefunctionG|Y|→G|X|in FUnCAL rather than an expression that generates a (tuple of) tree. To leap the gap, we first define a relation between output-marker-free UnCAL graphs and FUnCAL tree expressions, and then we extend the relation to one that between general UnCAL graphs and FUnCAL functions.

First, we define a graph obtained from an expression as a labeled transition system [33].

Definition 1. A reduction graph Ge0,X of a (possibly-open)

FUnCAL expressione0of typeGnand markersX={&x1, . . . ,&xn}

5_{Precisely, paramorphism is a notion for inductive datatypes. We borrow}

(10)

Γ⊢ {}::DB{_Y&} •C-SINGLE

Γ⊢g::DB{_Y&} e

Γ⊢ {l:g}::DB_Y{&} λy.l: (e y)C-EDGE

Γ⊢gi::DB&Y ei _i₌₁_,₂

Γ⊢g1∪g2::DB&Y λy.(e1y)∪(e2y)

C-UNI

Γ⊢g::DB{_Y&} e

Γ⊢&x ⊲ g::DB{_Y&x} e

C-ROOT

&y=&yiof(&y1, . . . ,&yn) =Y

Γ⊢&y::DB{_Y&} λz.πiz

C-HOLE

Γ⊢() ::DB∅Y λy.() C-EMP

{Γ⊢gi::DBXYi ei}i=1,2 p≡λX1.λX2.X1∪X2

Γ⊢g1⊕g2::DBYX1⊎X2 λy.p(e1y) (e2y)

C-RU

Γ⊢g1::DBXY e1 Γ⊢g2::DBYZ e2

Γ⊢g1@g2::DBXZ λy.e1(e2y)

C-SUBST

Γ⊢g::DBX X⊎Y e

Γ⊢cycle(g) ::DBYX λY .fixG(X.e X∪Y) C-CYC

Γ⊢y:: Γ(y) yC-VAR Γ⊢g::DBXY e Y ⊆Y′

Γ⊢g::DBXY′ λY′.e Y

C-SUB

Figure 6. Conversion rules of UnCAL graph-constructors.

with &xi < &xj (i < j) is an (possibly-infinite) UnCAL

graph(V, E, I,∅) where V is the set of FUnCAL expressions,

E = {(e,a, e′)|e⇒∗_a_:_e′_}_{, and}_I ₌ _{&_x

1 7→ π1e0, . . . ,

&xn7→πne0}. Here, the relation(⇒)is defined by:

e1∪e2⇒e1 e1∪e2⇒e2 e⇒e′ife→e′

where,→is the call-by-name reduction.

Note that, in each reduction by⇒, only(∪)occurring at the top is interpreted as nondeterministic choice. We writeGeforGe,Xif

Xis clear from the context or not relevant in the context. We abuse the notation to writeG∼eande∼e′forG∼GeandGe∼Ge′, respectively. Note thatGsuch thatG∼efor someemust not have output markers. By definition, ifeande′are equivalent as infinite constructor trees (i.e.,∪is frozen), thene∼e′.

Then, we define a correspondence (≈) between an UnCAL graphG::DBXY and an expressione::G|

Y| _→_G|X|_by:_G_≈_e

iff(G@(G1⊕. . .⊕G|Y|))∼(e(e1, . . . , e|Y|))for anyGj, ej

(1≤j≤ |Y|)such thatGj::DB&∅andGj∼ej. Now, we have the following theorem.

Theorem 1(Correctness). If⊢g::DBYX e,JgK≈eholds.

Proof. See Appendix A.

4.3 Translation ofisEmpty

The full-set of UnCAL containsisEmptyas we have mentioned before. Although many transformation can be described without

isEmpty[8] as those obtained from UnQL, there are still useful transformations that requireisEmpty; for example, some UnCAL programs converted from UnQL+, such as Class2RDBin Sec-tion 2.3, containisEmpty[23].

Since a graph is translated to an infinite tree, it is natural that

isEmptyis translated to the productivity test that checks whether

esatisfiese ⇒∗ a : e′ for some aand e′, which is generally undecidable. In other words,isEmptyis translated to an oracle instead of a computable function according to our translation. This is the reason why we consider the positive subset of UnCAL. For the positive subset of UnCAL, we can reason about the UnCAL programs through translated functional programs. For the full-set of UnCAL, additional reasoning effort is needed to handle the productivity-test oracle, although special treatment of markers are not necessary in reasoning of the translated programs.

However, isEmpty does not pose any problems when we

execute UnCAL programs as functional ones (Sections 5, 6, and 7). Roughly speaking, the type system in Section 6 ensures that the productivity test isdecidablefor the well-typed programs.

5. Evaluating Functional Programs as

Finite-Graph Transformations

The translation in Section 4 enables us to reason about UnCAL pro-grams as functional ones; for example, we can apply verification techniques studied for functional programs to UnCAL ones (Sec-tion 2.1). However, the translated programs are not graph transfor-mations; i.e., according to the usual semantics, they may result in infinite trees rather than finite graphs.

In this section, we give a semantics based on Nakata and Hasegawa [34]’s lazy semantics, which extends Launchbury [28]’s natural semantics with the black hole[2, 3, 34] (“apparent un-definedness”), so that a FUnCAL program runs as a finite-graph transformation. This section focuses on the formal description of the semantics. Formal discussions about termination will be post-poned to Sections 6 and 7.

The basic idea is to exploit a pointer-structure in a heap under the lazy evaluation. For example, the heap obtained after eval-uation of fixG(λx.a:x) in the usual lazy evaluation is cyclic and has a similar structure to a corresponding UnCAL graph Jcycle({a:&})K. However, an extra effort is required to handle

fixG(λx.x)andfold(λa.λr.r)(e)for example, which are nonter-minating in usual semantics.

The lazy semantics with the black hole [34] plays an impor-tant role to resolve the problem. SincefixG(λx.x) evaluates to the black hole without running infinitely in the lazy semantics, we identify the black hole with a singleton graph, and obtain a singleton graph as the evaluation result offixG(λx.x). Still, the semantics is not sufficient for terminating evaluation of recur-sions such as fold(λa.λr.r)(e). To make the evaluation of re-cursions terminating, we adoptmemoization. Roughly speaking, sinceeoffold(λa.λr.r)(e)represents a graph, the recursive call offold(λa.λr.r)must take the same argument twice in the evalua-tion. Memoization is used to detect the situation, and make the call result in the black hole.

5.1 Modified Syntax with Memos

As mentioned above, we adopt memoization for structural recur-sions. Accordingly, the syntax of FUnCAL is changed as

e::=· · · |foldMe

wherefoldne is replaced withfoldMein whichM represents a memo. The memosM are all∅initially, and entries are added through evaluation. Although tuples and projections are important in previous sections, we shall ignore them henceforth because they are not relevant in our technical development in the following sec-tions; our discussions can be extended to them straightforwardly. This is the reason whyfoldMeabove does not have the subscript.

(11)

hE[•e2]|µi → hE[•]|µi

hE[Ce1e2]|µi → hE[Cx1x2]|x1=e1, x2=e2, µi

(x1, x2: fresh)

hE[fixGe]|µi → hE[w]|w=e w, µi (w: fresh)

hE[foldM_e

•]|µi → hE[•]|µi

hE[foldMe v]|µi → hE[w]|w=e x1(foldM

′

e x2), µi

(M′₌_M_[_v_7→_w_]_{, w}_{: fresh}₎

ifv=x1:x2, v6∈dom(M)

hE[foldMe v]|µi → hE[w]|w=foldM′e x1∪foldM

′

e x2, µi

(M′₌_M_[_v_7→_w_]_{, w}_{: fresh}₎

ifv=x1∪x2, v6∈dom(M)

hE[foldM_{e v}_]_|_µ_{i → h}_E_[_w_]_|_µ_i ifM(v) =w

Figure 7. Reduction rules of our lazy abstract machine: we assume that(λx.e)is α-renamed so that the variables introduced in the right-hand sides are fresh.

5.2 Semantics

Now, we describe our lazy semantics to execute FUnCAL programs as graph transformations.

Avaluev, or weak head normal form, is defined by:

v::=a|Cx1x2|λx.e|foldMe| •.

That is, a value is either of a label, an expression guarded by a constructorCwherex1 andx2refer some expressions via a heap

introduced later, a function, a memoized recursion, or the black hole. Anevaluation contextEis defined by:

E::=✷|E e|(foldMe)E|x:=E

An evaluation contextx:=✷is characteristic in lazy evaluation, which represents heap-update after the expression referred by x

becomes a value. We writeE[e]for an expression obtained from

Eby replacing✷withe. Note that(foldMe)Eabove means that

foldM_e

is strict. Aheapis a mapping from variables to expressions. A configuration is a pair hx|µi where x is a variable to hold an expression to be evaluated in µand µis a heap. We assume that configurations are closed; hx|µi is closed if x and every variable occurring in the right-hand sides ofµalso occur in a left-hand side ofµ. One can think thathx|x1=e1, . . . , xn=enias

letrec x1 = e1 and . . . and xn = en in x, but we use different notation to avoid confusion with our restricted forms of recursive definitions such asfixG andfold. We sometimes write

he|µiforhx|x=e, µiwhere we do not carex. We assume that expressions are appropriatelyα-renamed to avoid capturing; more explicit treatment of variable renaming can be found in [34].

The reduction relation hx|µi → hx|µ′_i _{is defined by the}

rules in Figure 7. The rules except the ones forfoldMe are just straightforward extensions of [34] with constructors.

The black hole•represents the apparent undefinedness yielded when the value ofxis required by the evaluation ofxitself [34]. A typical example isfixG(λx.x), which will be evaluated as:

hfixG(λx.x)| ∅i → hw|w= (λx.x)wi

→ hw:= (λx.x)w|w=•i → hw:=x|w=•, x=wi → hw:= (x:=w)|w=•, x=•i →∗_h_w_{:= (}_x_:=_•₎_|_w₌_•_{, x}₌_•i

→∗_{h• |}_w₌_•_{, x}₌_•i

In contrast,fixG(λx.a:x)does not lead to•because of the lazy semantics; recall that the values are weak head normal forms.

hfixG(λx.a:x)| ∅i

→ hw|w= (λx.a:x)wi → hw:= ((λx.a:x)w)|w=•i → hw:=a:x|x=w, w=•i

→ hw:=a′:x′|a′=a, x′=x, x=w, w=•i → ha′_:_x′_|_a′₌_a_{, x}′₌_{x, x}₌_{w, w}₌_a′_:_x′_i

The reduction rules for foldMe are keys in this semantics. It basically works asfold1ein Section 4.1; indeed, the first, the

sec-ond and the third rules correspsec-ond to (Fold1), (Fold2) and (Fold3), respectively. The first rule also says that•can be seen as an ex-ception. The second and the third rules update the memo, and the fourth rule offoldMelooks up the memo if there is corresponding entry in the memo.

Thanks to memoization,fold∅(λz.λr.r) (fixG(λx.a: x)), the problematic example shown in Section 1, evaluates to•without major changes to the original semantics [34]. For an illustration, see Example 1 below.

Example 1(Eliminate All Edges). Let us consider the expression

fold∅(λz.λr.r) (fixG(λx.a:x))

Here,fold∅(λz.λr.r)eliminates all the edges, and thus we expect that the expression results in•(a singleton graph). Let us write

elMforfoldM(λz.λr.r), then the above expression is evaluated as follows.

hel∅(fixG(λx.a:x))| ∅i

→∗_h_el

∅(a′:x′)|a′=a, x′=x, x=w, w=a′:x′, . . .i

→ hu|u= (λz.λr.r)a′₍_el

{(a′:x′)7→u}x′), . . .i

→∗_h_u_:=_el

{(a′:x′)7→u}x′|u=•, . . .i

→∗_h_u_:=_el

{(a′:x′)7→u}(a′:x′)|u=•, . . .i

→ hu:=u|u=•, . . .i {hit!} → h• |. . .i

Here, we underlined the configurations whereelM inspected the memoM. Memoization plays an important role to achieve the intuitive result. At the first-underlined reduction ofelM, the entry

(a′ :x′)7→uis added toM. At the second-underlined reduction ofelM, sinceM(a′ : x′) = uholds, the call is reduced to the variableu. Note that, in the evaluation ofuafter the first-underlined reduction ofelM, the referred value was replaced with•. Thus, we got•as the final result.

An important property on memos is that, if looking-up succeeds, then the looked-up object must be a value, which will be used in Section 7.

Lemma 1(Look-up). Suppose thathe| ∅i →∗ _h_E_[_foldM_{e v}_] |µi whereM(v) =x. Then,µ(x)is a value.

5.3 Extracting Graphs

As mentioned early in this section, we obtain a graph from an evaluation result as the “graph” structure of a heap. For example, for a configurationhx|x=a:x, a=ai, we obtain a graphG= (V, E, I, O) withV = {x}, E = {(x,a, x)}, I = {&7→x} and O = ∅. In general, since a heap may contain unevaluated expressions, we have to evaluate them to extract a graph from the heap. Here, we formally describe how to extract a graph from a heap. In this subsection and in Section 7, we shall omit(∪)for simplicity.

(12)

xisaccessiblefromyinµwhen(x, y)in the reflexive transitive closure of the{(z, w)|w∈fv(µ(z))}.

For a configurationhx|µi, this deep evaluation is easily done by evaluatinghelim∅x|µiwhereelimM =foldM(λa.λr.ifa=a

thenrelser). An applicationelim∅xeliminates all the edges

fromx, and thus it results in•if terminates. Ifhelim∅x|µi →∗

h• |µ′i, thenhx|µiandhx|µ′iare “bisimilar”, andhx|µ′i satis-fies the required condition above.

Now, let us define how to extract a graph from a heap. Let

e0 be a closed expression of typeG, and suppose that we have

helim∅x0|x0=e0i →∗ h• |µi. Then, from the discussions

above, µ(x) is a value for everyxaccessible from x0. We can

easily construct a graph from such a heap bygraphify(x0, e0)

de-fined as follows.

graphify(x0, e0) = (V, E,{&7→x0},∅)

where V =accessible variables fromxinµ E=S

x=(x1:x2)∈µ,x∈V{(x, µ(x1), x2)} helim∅x0|x0=e0i →∗h• |µi

Note thatµ(x1)above is a value, more concretely a label literal.

Thus, the termination under the contextelim∅✷means that an

expression corresponds to a finite graph.

Treatment of(∪). If we have(∪), it suffices to use a context

isEmpty∅(elim∅✷)instead ofelim∅✷, whereisEmptyMis a

mem-oized FUnCAL version ofisEmpty. If we only allowisEmpty∅to appear the outermost context, only an extra effort to prove the ter-mination is one more case analysis added to the proof of Lemma 9. With (∪), the definition ofgraphify(x0, e0) is changed

accord-ingly; specifically, the definition ofE is changed toE = · · · ∪

S

x=(x1∪x2)∈µ,x∈V{(x, ε, x1),(x, ε, x2)}.

The next theorem states that the two semantics (the call-by-name and the lazy abstract machine) coincide.

Theorem 2. Ifgraphify(x, e) =G,GeandGare bisimilar.

Proof (Sketch). We extend the notation of the reduction graph to the configurations, and show that reductions of the abstract ma-chine preserves the reduction graphs (up to bisimilarity). Forhx|µi whereµ(y)is a value for anyyaccessible fromx, it is easy to prove that the reduction graph and the result ofgraphifyare bisimilar.

Remark. This construction of a graph from a configuration runs

in time linear to the heap size. This efficient construction is achieved byε-edges that postpone∪-operation. To obtainε-free graphs, we have to pay a similar cost toε-elimination in automata, i.e., cubic time to the number of nodes (= the size of the heap).

Together with Lemma 1, the following lemma says that growth of memoM ofelimMdoes not change the termination, which will be used in Section 7.

Lemma 2(Memo and Termination). Ifhelim∅e|µi terminates,

thenhelimMe|µialso terminates for anyMsuch thatµ(M(v))

is a value for allv∈dom(M).

6. Type System

In this section, we describe the type system that guarantees termi-nation of→under the contextelim∅✷; that is, well-typed

expres-sions are finite-graph transformations. We shall only show the type system in this section; termination will be discussed in Section 7.

6.1 Idea

In advance to the formal definition of our type system, we discuss a problematic example we want to exclude to explain the underlying

idea of the type system. Consider the expressionaInB bswhere

bs =fixG(λx.b:x)

aInB =fold(λz.λr.z:insAr) insA =fold(λz.λr.a:z:r).

(Here, we ignore memos for a while.) One might notice thatinsA

is applied to a variablerthat holds the result of the recursive call ofaInB. The expression evaluates to a nonregular tree as

aInB bs →∗

b:insA(aInB bs)

→∗_b_:_insA₍_b_:_insA₍_{aInB bs}₎₎

→∗_b_:_a_:_b_:_insA2₍_{aInB bs}₎

→∗_b_:_a_:_b_:_a_:_a_:_a_:_b_:_insA3₍_{aInB bs}₎

→∗_b_:_a_:_b_:_a_:_a_:_a_:_b_:_{· · ·}_:_insAn₍_{aInB bs}₎ and thus must not correspond to a finite graph. Here, one can find that the number of nested applications ofinsAincreases in the eval-uation, which leads to this nonregularity and thus nontermination ofelim∅(aInB bs). In contrast, such nonregularity does not arise

for functionsinsAitself andel in Example 1. For example, if we apply them tobsabove, we have

el bs→∗_{el bs} _{insA bs}_→∗_a_:_b_:_{insA bs}

Thanks to this looping structure,elim∅(el bs)andelim∅(insA bs)

terminate with memoization.

To exclude such a problematic case, we use a modal type system with modality ❞_{that represents “already constructed (and thus}

regular)”, and restrict that the argument offoldfto be a tree that is already constructed and regular. For an expressioneof type ❞_G_,

since we know thateis evaluate to a regular tree, the application offoldftoeterminates, asinsA bswhere we know the regularity ofbsbeforehandthe application. However, for an expressioneof typeG, application offoldfis not always terminating because its results can be referred in theeby some outer-contexts, asinsA bs

and its simpler versionfixG(λx.b:insAx). Here we cannot know the regularity of the argument because the tree is being constructed.

6.2 Modal Types

Atypeτis defined as follows.

τ::=G |τ1→τ2|L G::=G| ❞G

Types consist of graph types with modality (G), function types (→) and the label type (L). As explained above, the modality ❞

represents “already constructed”. For example, the argument of

foldmust has type ❞_G_{to produce a graph of type}_G_{. It is natural}

to have a subtyping relation ❞_{G G}_{, which means that a graph}

constructed in some period is still available after the period. The structural subtyping rules forare standard ones and we shall omit them.

Figure 8 shows the typing rules. The typing rules for variables andλs are standard ones. The rules T-CONS, T-CHOICEand T-FIX

says that graph constructors construct a graph in a period from graphs in the same period. The rule T-BH says that•is something similar to an exception. The rule T-CATA is special in our type system. The rule requires that the argument of fold must be a graph that is constructed beforehand; for such a tree, we can use its finiteness and guarantee the termination of the application. In the premise of T-CATA, since the memo maps arguments offoldM_e

to their return values,vmust be of type ❞_G_and_M₍_v₎_{must be of type} G.

Our type system can be easily extended to configurations, and we can easily prove that the preservation and the progress property. An important property for our purpose is that functional pro-grams translated from UnCAL as in Section 4 are well-typed.

Theorem 3. If ⊢g::DBX

Y e, then ⊢ e :: (❞nG)|Y| →

(13)

Γ(x) =τ Γ⊢x::τT-VAR

Γ, x::τ1⊢e::τ2

Γ⊢λx.e::τ1→τ2T-ABS

Γ⊢e1::τ′→τ Γ⊢e2::τ′

Γ⊢e1e2::τ

T-APP

Γ⊢a::LT-LABEL

Γ⊢e1::L Γ⊢e2::G

Γ⊢e1:e2::G T-CONS

{Γ⊢ei::G}i=1,2

Γ⊢e1∪e2::G

T-CHOICE

Γ⊢ •::τT-BH Γ⊢e::G → G

Γ⊢fixGe::G T-F

IX

Γ⊢e::L→ G → G

{Γ⊢v:: ❞_{G ∧}_Γ_⊢_x_::_{G |}_v_∈_dom₍_M₎_{, x}₌_M₍_v₎_}

Γ⊢foldMe:: ❞_{G → G} T-CATA

Γ⊢e::τ′ _τ′_τ

Γ⊢e::τ T-SUB

Figure 8. Typing rules for termination.

Proof. See Appendix B.

We state that the typed expressions respect bisimulation; in other words, a typed expression cannot distinguish bisimilar ex-pressions (in the sense of Section 4.2).

Theorem 4. Suppose⊢f::G → G′_{. For any}_e

1ande2such that

⊢ei::G(i= 1,2), ife1∼e2, thenf e1∼f e2.

Proof. See Appendix C.

One might think that it would suffice to have only two graph types ❞_G_and_G_{instead of having many graph types} ❞n_G_would

suffice because ❞_{represents capability to be an argument of}_fold_,

and have a type rule fold :: (L → G → G) → ❞_G _→ _G_.

However, this prohibits the composition offold, which means that we cannot express what the original UnCAL can express. One then might think that we could use the rulefold:: (L→G →G) → ❞_G_→ ❞_G_{instead. However, this violates the type safety; note that}

we haveλx.fold(λa.λr.x) (Dummy:•) ::G→ ❞_G_.

7. Soundness of the Type System

In this section, we prove that every expressione of typeG rep-resents a finite graph; i.e., the evaluation ofelim∅eterminates and

thusgraphifysucceeds. To simplify our discussion, as wrote in Sec-tion 5.3, we ignore(∪)in this section.

Formally, we prove the following property:

Theorem 5(Soundness). If⊢ e ::G, thenhelim∅x|x=ei →∗

h• |µi.

We will prove the theorem by using logical relation [39].

7.1 Modification to Type System

Recall that we have said that our type system can be extended to heaps and configurations; a solution would be defining that

µ is well-typed under Γif ∀x.Γ ⊢ µ(x) :: Γ(x). However, typing derivations could be cyclic viaµ in this naive approach. This prevents us from using the logical-relation based termination proof as in the simply-typed λ-calculus [39]. In other words, a configurationhx|µiis essentially cyclic [2].

To overcome the problem, we parameterize a type system by a heap. In analogy with the lazy evaluation strategy in which every variable is evaluated at most once, every variable is dereferenced at

most once in typing. This idea is realized by splitting T-VARinto the following two rules.

Γ(x) =τ Γ⊢µx::τ

T-VAR Γ⊢µ,x=•e::τ

Γ⊢µ,x=ex::τ

T-VARREC

Here, we assume thatdom(µ) and dom(Γ) are disjoint, which means that there is no nondeterminism to apply T-VARor T-VARREC. The other rules remain unchanged.

The following fact says that an expression and a heap that are typable in the original type system are also typable in the parameterized type system.

Lemma 3. If we haveΓ,∆⊢e::τandΓ,∆⊢µ(x) :: ∆(x)for anyxindom(∆), thenΓ⊢µe::τholds.

We define asubstitutionσas a mapping from variables to ex-pressions of which domain is finite. We writetσfor the application of the substitutionσto an expression/heapt.

The following lemma says that an evaluation of an expression of typeGcannot “observe” graphs of typeGbecause of the modality restriction; recall thatfoldis the only language construct that can observe graphs.

Lemma 4. LetZ = {z1, . . . , zk}. Suppose z1 :: G, . . . , zk :: G ⊢µ e :: G. Then, ifhelimMeσ|µσi terminates for someσ :

Z → VM, thenhelimMeσ′|µσ′iterminates for allσ′ :: Z →

VMwhereVM =dom(M)∪ {•}.

Proof.See Appendix D.

7.2 Logical Relation

Then, we define a logical relationR.

Definition 2(RelationR). The unary relationRτ on configura-tions is defined as follows.

• he|µi ∈ RLiffhe|µi →∗hv|µ′ifor somevandµ′.

• _he|µi ∈ RGiffhelim_∅e|µi →∗h• |µ′ifor someµ′.

• he|µi ∈ R❜Giffhe|µi ∈ RG.

• _hf|µi ∈ Rτ1→τ2 iffhf|µi →

∗

hv|µ′ifor somevandµ′, andhf e|µ∪ηi ∈ Rτ2for anyhe|ηi ∈ Rτ1.

For heaps µandµ′, we writeµ∪µ′ for the union ofµ and

µ′, assuming thatµ(x) =µ′(x)for anyx∈dom(µ)∩dom(µ′). Intuitively,R(τ) defines pairs of expressions and heaps that are “meaningful” asfinite-graphtransformations. Especially,he|µi ∈ RG means that he|µi corresponds to a finite graph. Note that,

thanks toelim∅in the definition, we haveaInB bs6∈ RGforbs=

fixG(λx.b:x)while the evaluation ofaInB bsitself terminates. Also note that we haveRG =R❜Gbecause⊢µe::Gif and only if⊢µe:: ❞G; the modality is used only to prove Lemma 4.

In the later proof, we will use the following properties onR. Lemma 5 says that “garbage cells” in the heap does not affect termination. Lemma 6 says that one-step reduction does not change the termination property, which is rather obvious.

Lemma 5. he|µi ∈ Rτimplieshe|µ∪µ′i ∈ Rτ.

Lemma 6(Preservation under Reduction). Suppose⊢µE[e] :: τ

andhE[e]|µi → hE′[e′]|µ′i. Then,hE[e]|µi ∈ Rτ if and only

ifhE′[e′]|µ′i ∈ Rτ.

7.3 Lemmas for Recursive Definitions

In advance to the proof of the termination, we prove some lemmas statingRis preserved in our recursive definitions.

The following lemma intuitively says that typedfixGe expres-sions corresponds to a finite graph ifepreserves finiteness.

Lemma 7(fixG). Suppose⊢µe0 ::G → Gandhe0e′|µ∪µ′i ∈